{ "cells": [ { "cell_type": "markdown", "id": "a7b4e957", "metadata": {}, "source": [ "(dataset-spec)=\n", "# Quantify dataset specification\n", "\n", "```{seealso}\n", "The complete source code of this tutorial can be found in\n", "\n", "{nb-download}`Quantify dataset - specification.ipynb`\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "135c3519", "metadata": { "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import matplotlib.pyplot as plt\n", "import xarray as xr\n", "from rich import pretty\n", "\n", "import quantify_core.data.dataset_adapters as dadapters\n", "import quantify_core.data.dataset_attrs as dattrs\n", "from quantify_core.data import handling as dh\n", "from quantify_core.utilities import dataset_examples\n", "from quantify_core.utilities.examples_support import round_trip_dataset\n", "from quantify_core.utilities.inspect_utils import display_source_code\n", "\n", "pretty.install()\n", "\n", "dh.set_datadir(Path.home() / \"quantify-data\") # change me!" ] }, { "cell_type": "markdown", "id": "fba7c614", "metadata": {}, "source": [ "This document describes the Quantify dataset specification.\n", "Here we focus on the concepts and terminology specific to the Quantify dataset.\n", "It is based on the Xarray dataset, hence, we assume basic familiarity with the {class}`xarray.Dataset`.\n", "If you are not familiar with it, we highly recommend to first have a look at our {ref}`xarray-intro` for a brief overview.\n", "\n", "(sec-coordinates-and-variables)=\n", "\n", "## Coordinates and Variables\n", "\n", "The Quantify dataset is an xarray dataset that follows certain conventions. We define \"subtypes\" of xarray coordinates and variables:\n", "\n", "(sec-main-coordinates)=\n", "\n", "### Main coordinate(s)\n", "\n", "- Xarray **Coordinates** that have an attribute {attr}`~quantify_core.data.dataset_attrs.QCoordAttrs.is_main_coord` set to `True`.\n", "\n", "- Often correspond to physical coordinates, e.g., a signal frequency or amplitude.\n", "\n", "- Often correspond to quantities set through {class}`.Settable`s.\n", "\n", "- The dataset must have at least one main coordinate.\n", "\n", " > - Example: In some cases, the idea of a coordinate does not apply, however a main coordinate in the dataset is required. A simple \"index\" coordinate should be used, e.g., an array of integers.\n", "\n", "- See also the method {func}`~quantify_core.data.dataset_attrs.get_main_coords`.\n", "\n", "(sec-secondary-coordinates)=\n", "\n", "### Secondary coordinate(s)\n", "\n", "- A ubiquitous example is the coordinates that are used by \"calibration\" points.\n", "- Similar to {ref}`main coordinates `, but intended to serve as the coordinates of {ref}`secondary variables `.\n", "- Xarray **Coordinates** that have an attribute {attr}`~quantify_core.data.dataset_attrs.QCoordAttrs.is_main_coord` set to `False`.\n", "- See also {func}`~quantify_core.data.dataset_attrs.get_secondary_coords`.\n", "\n", "(sec-main-variables)=\n", "\n", "### Main variable(s)\n", "\n", "- Xarray **Variables** that have an attribute {attr}`~quantify_core.data.dataset_attrs.QVarAttrs.is_main_var` set to `True`.\n", "- Often correspond to a physical quantity being measured, e.g., the signal magnitude at a specific frequency measured on a metal contact of a quantum chip.\n", "- Often correspond to quantities returned by {class}`.Gettable`s.\n", "- See also {func}`~quantify_core.data.dataset_attrs.get_main_vars`.\n", "\n", "(sec-secondary-variables)=\n", "\n", "### Secondary variables(s)\n", "\n", "- Again, the ubiquitous example is \"calibration\" datapoints.\n", "- Similar to {ref}`main variables `, but intended to serve as reference data for other main variables (e.g., calibration data).\n", "- Xarray **Variables** that have an attribute {attr}`~quantify_core.data.dataset_attrs.QVarAttrs.is_main_var` set to `False`.\n", "- The \"assignment\" of secondary variables to main variables should be done using {attr}`~quantify_core.data.dataset_attrs.QDatasetAttrs.relationships`.\n", "- See also {func}`~quantify_core.data.dataset_attrs.get_secondary_vars`.\n", "\n", "```{note}\n", "In this document we show exemplary datasets to highlight the details of the Quantify dataset specification.\n", "However, for completeness, we always show a valid Quantify dataset with all the required properties.\n", "```\n", "\n", "In order to follow the rest of this specification more easily have a look at the example below.\n", "It should give you a more concrete feeling of the details that are exposed afterward.\n", "See {ref}`sec-dataset-examples` for an exemplary dataset.\n", "\n", "We use the\n", "{func}`~quantify_core.utilities.dataset_examples.mk_two_qubit_chevron_dataset` to\n", "generate our dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "e7666dae", "metadata": { "mystnb": { "code_prompt_show": "Source code for generating the dataset below" }, "tags": [ "hide-cell" ] }, "outputs": [], "source": [ "display_source_code(dataset_examples.mk_two_qubit_chevron_dataset)" ] }, { "cell_type": "code", "execution_count": null, "id": "055f7110", "metadata": {}, "outputs": [], "source": [ "dataset = dataset_examples.mk_two_qubit_chevron_dataset()\n", "assert dataset == round_trip_dataset(dataset) # confirm read/write" ] }, { "cell_type": "markdown", "id": "47ee7eb0", "metadata": {}, "source": [ "### 2D example\n", "\n", "In the dataset below we have two main coordinates `amp` and `time`; and two main\n", "variables `pop_q0` and `pop_q1`.\n", "Both main coordinates \"lie\" along a single xarray dimension, `main_dim`.\n", "Both main variables lie along two xarray dimensions `main_dim` and `repetitions`." ] }, { "cell_type": "code", "execution_count": null, "id": "40661f89", "metadata": {}, "outputs": [], "source": [ "dataset" ] }, { "cell_type": "markdown", "id": "e1bbbe76", "metadata": {}, "source": [ "**Please note** how the underlying arrays for the coordinates are structured!\n", "Even for \"gridded\" data, the coordinates are arranged in arrays\n", "that match the dimensions of the variables in the xarray. This is\n", "done so that the data can support more complex scenarios, such as\n", "irregularly spaced samples and measurements taken at unknown locations." ] }, { "cell_type": "code", "execution_count": null, "id": "ae230c4f", "metadata": {}, "outputs": [], "source": [ "n_points = 110 # only plot a few points for clarity\n", "_, axs = plt.subplots(4, 1, sharex=True, figsize=(10, 10))\n", "dataset.amp[:n_points].plot(\n", " ax=axs[0], marker=\".\", color=\"C0\", label=dataset.amp.long_name\n", ")\n", "dataset.time[:n_points].plot(\n", " ax=axs[1], marker=\".\", color=\"C1\", label=dataset.time.long_name\n", ")\n", "_ = dataset.pop_q0.sel(repetitions=0)[:n_points].plot(\n", " ax=axs[2], marker=\".\", color=\"C2\", label=dataset.pop_q0.long_name\n", ")\n", "_ = dataset.pop_q1.sel(repetitions=0)[:n_points].plot(\n", " ax=axs[3], marker=\".\", color=\"C3\", label=dataset.pop_q1.long_name\n", ")\n", "for ax in axs:\n", " ax.legend()\n", " ax.grid()" ] }, { "cell_type": "markdown", "id": "447e03e1", "metadata": {}, "source": [ "As seen above, in the Quantify dataset the main coordinates do not explicitly index\n", "the main variables because not all use-cases fit within this paradigm.\n", "However, when possible, the Quantify dataset can be reshaped to take advantage of the\n", "xarray built-in utilities.\n", "\n", "\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "ddeb8f38", "metadata": {}, "outputs": [], "source": [ "dataset_gridded = dh.to_gridded_dataset(\n", " dataset,\n", " dimension=\"main_dim\",\n", " coords_names=dattrs.get_main_coords(dataset),\n", ")\n", "dataset_gridded.pop_q0.plot.pcolormesh(x=\"amp\", col=\"repetitions\")\n", "_ = dataset_gridded.pop_q1.plot.pcolormesh(x=\"amp\", col=\"repetitions\")" ] }, { "cell_type": "markdown", "id": "e4540870", "metadata": {}, "source": [ "## Dimensions\n", "\n", "The main variables and coordinates present in a Quantify dataset have the following required and optional xarray dimensions:\n", "\n", "### Main dimension(s) \\[Required\\]\n", "\n", "The main dimensions comply with the following:\n", "\n", "- The outermost dimension of any main coordinate/variable, OR the second outermost dimension if the outermost one is a {ref}`repetitions dimension `.\n", "- Do not require to be explicitly specified in any metadata attributes, instead utilities for extracting them are provided. See {func}`~quantify_core.data.dataset_attrs.get_main_dims` which simply applies the rule above while inspecting all the main coordinates and variables present in the dataset.\n", "- The dataset must have at least one main dimension.\n", "\n", "```{admonition} Note on nesting main dimensions\n", "Nesting main dimensions is allowed in principle and such examples are\n", "provided but it should be considered an experimental feature.\n", "\n", "- Intuition: intended primarily for time series, also known as \"time trace\" or simply trace. See {ref}`sec-dataset-t1-traces` for an example.\n", "```\n", "\n", "### Secondary dimension(s) \\[Optional\\]\n", "\n", "Equivalent to the main dimensions but used by the secondary coordinates and variables.\n", "The secondary dimensions comply with the following:\n", "\n", "- The outermost dimension of any secondary coordinate/variable, OR the second outermost dimension if the outermost one is a {ref}`repetitions dimension `.\n", "- Do not require to be explicitly specified in any metadata attributes, instead utilities for extracting them are provided. See {func}`~quantify_core.data.dataset_attrs.get_secondary_dims` which simply applies the rule above while inspecting all the secondary coordinates and variables present in the dataset.\n", "\n", "(sec-repetitions-dimensions)=\n", "### Repetitions dimension(s) \\[Optional\\]\n", "\n", "Repetition dimensions comply with the following:\n", "\n", "- Any dimension that is the outermost dimension of a main or secondary variable when its attribute {attr}`QVarAttrs.has_repetitions ` is set to `True`.\n", "- Intuition for this xarray dimension(s): the equivalent would be to have `dataset_reptition_0.hdf5`, `dataset_reptition_1.hdf5`, etc. where each dataset was obtained from repeating exactly the same experiment. Instead we define an outer dimension for this.\n", "- Default behavior of (live) plotting and analysis tools can be to average the main variables along the repetitions dimension(s).\n", "- Can be the outermost dimension of the main (and secondary) variables.\n", "- Variables can lie along one (and only one) repetitions outermost dimension.\n", "\n", "#### Example datasets with repetition\n", "\n", "As shown in the {ref}`xarray-intro` an xarray dimension can be indexed by a `coordinate` variable. In this example the `repetitions` dimension is indexed by the `repetitions` xarray coordinate. Note that in an xarray dataset, a dimension and a data variable or a coordinate can share the same name. This might be confusing at first. It takes just a bit of dataset manipulation practice to gain an intuition for how it works." ] }, { "cell_type": "code", "execution_count": null, "id": "db7e9161", "metadata": {}, "outputs": [], "source": [ "coord_dims = (\"repetitions\",)\n", "coord_values = [\"A\", \"B\", \"C\", \"D\", \"E\"]\n", "dataset_indexed_rep = xr.Dataset(coords=dict(repetitions=(coord_dims, coord_values)))\n", "\n", "dataset_indexed_rep" ] }, { "cell_type": "code", "execution_count": null, "id": "db12dae4", "metadata": {}, "outputs": [], "source": [ "# merge with the previous dataset\n", "dataset_rep = dataset.merge(dataset_indexed_rep, combine_attrs=\"drop_conflicts\")\n", "\n", "assert dataset_rep == round_trip_dataset(dataset_rep) # confirm read/write\n", "\n", "dataset_rep" ] }, { "cell_type": "markdown", "id": "4e009fb1", "metadata": {}, "source": [ "And as before, we can reshape the dataset to take advantage of the xarray built-in utilities." ] }, { "cell_type": "code", "execution_count": null, "id": "cb737344", "metadata": {}, "outputs": [], "source": [ "dataset_gridded = dh.to_gridded_dataset(\n", " dataset_rep,\n", " dimension=\"main_dim\",\n", " coords_names=dattrs.get_main_coords(dataset),\n", ")\n", "dataset_gridded" ] }, { "cell_type": "markdown", "id": "defd4a70", "metadata": {}, "source": [ "It is now possible to retrieve (select) a specific entry along the `repetitions` dimension:" ] }, { "cell_type": "code", "execution_count": null, "id": "b5100ad2", "metadata": {}, "outputs": [], "source": [ "_ = dataset_gridded.pop_q0.sel(repetitions=\"A\").plot(x=\"amp\")\n", "plt.show()\n", "_ = dataset_gridded.pop_q0.sel(repetitions=\"D\").plot(x=\"amp\")" ] }, { "cell_type": "markdown", "id": "c674ba9d", "metadata": {}, "source": [ "## Dataset attributes\n", "\n", "The required attributes of the Quantify dataset are defined by the following dataclass.\n", "It can be used to generate a default dictionary that is attached to a dataset under the {attr}`xarray.Dataset.attrs` attribute.\n", "\n", "```{eval-rst}\n", ".. autoclass:: quantify_core.data.dataset_attrs.QDatasetAttrs\n", " :members:\n", " :noindex:\n", " :show-inheritance:\n", "```\n", "\n", "Additionally in order to express relationships between coordinates and/or variables\n", "the following template is provided:\n", "\n", "```{eval-rst}\n", ".. autoclass:: quantify_core.data.dataset_attrs.QDatasetIntraRelationship\n", " :members:\n", " :noindex:\n", " :show-inheritance:\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "cdc5044e", "metadata": {}, "outputs": [], "source": [ "from quantify_core.data.dataset_attrs import QDatasetAttrs\n", "\n", "# tip: to_json and from_dict, from_json are also available\n", "dataset.attrs = QDatasetAttrs().to_dict()\n", "dataset.attrs" ] }, { "cell_type": "markdown", "id": "bff4a6c6", "metadata": {}, "source": [ "Note that xarray automatically provides the entries of the dataset attributes as python attributes. And similarly for the xarray coordinates and data variables." ] }, { "cell_type": "code", "execution_count": null, "id": "dbaf0f11", "metadata": {}, "outputs": [], "source": [ "dataset.quantify_dataset_version, dataset.tuid" ] }, { "cell_type": "markdown", "id": "ea725680", "metadata": {}, "source": [ "## Main coordinates and variables attributes\n", "\n", "Similar to the dataset attributes ({attr}`xarray.Dataset.attrs`), the main coordinates and variables have each their own required attributes attached to them as a dictionary under the {attr}`xarray.DataArray.attrs` attribute.\n", "\n", "```{eval-rst}\n", ".. autoclass:: quantify_core.data.dataset_attrs.QCoordAttrs\n", " :members:\n", " :noindex:\n", " :show-inheritance:\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "38e3e688", "metadata": {}, "outputs": [], "source": [ "dataset.amp.attrs" ] }, { "cell_type": "markdown", "id": "9d78274f", "metadata": {}, "source": [ "```{eval-rst}\n", ".. autoclass:: quantify_core.data.dataset_attrs.QVarAttrs\n", " :members:\n", " :noindex:\n", " :show-inheritance:\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "b389218d", "metadata": {}, "outputs": [], "source": [ "dataset.pop_q0.attrs" ] }, { "cell_type": "markdown", "id": "d558823d", "metadata": {}, "source": [ "## Storage format\n", "\n", "The Quantify dataset is written to disk and loaded back making use of xarray-supported facilities.\n", "Internally we write and load to/from disk using:" ] }, { "cell_type": "code", "execution_count": null, "id": "8eef4edf", "metadata": {}, "outputs": [], "source": [ "display_source_code(dh.write_dataset)\n", "display_source_code(dh.load_dataset)" ] }, { "cell_type": "markdown", "id": "f9c5a464", "metadata": {}, "source": [ "Note that we use the `h5netcdf` engine which is more permissive than the default NetCDF engine to accommodate arrays of complex numbers.\n", "\n", "```{note}\n", "Furthermore, in order to support a variety of attribute types (e.g. the `None` type) and shapes (e.g. nested dictionaries) in a seamless dataset round trip, some additional tooling is required. See source codes below that implements the two-way conversion adapter used by the functions shown above.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "d4e70eb0", "metadata": {}, "outputs": [], "source": [ "display_source_code(dadapters.AdapterH5NetCDF)" ] } ], "metadata": { "file_format": "mystnb", "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst" } }, "kernelspec": { "display_name": "python3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }